
improve gym env / tune hps #71

Merged
merged 33 commits into master from sanity-check-robot on Feb 1, 2024
Conversation

Armandpl
Owner

@Armandpl Armandpl commented Feb 1, 2024

  • set up new reward functions and sweep over them (a sketch of the two reward shapes is given after this list):
    • nothing very conclusive, each of them seems to converge in about the same time and lead to a working policy
      • slight doubt about the cos reward because one of the runs converged to a bad policy. todo (later): run more runs comparing the cos reward to another one and look at the resulting policies.
    • the idea was to use the exponential to get higher rewards above pi/2 and try to get the agent to converge faster. That seems to be the case, but the exp also makes the reward flatter close to 0, which seems to slow down early exploration.
  • set up a tiny sweep over TQC hyperparameters (see the sweep sketch below):
    • tiny sweep because each run takes ~10 min, so we can't sweep over that many hyperparameters
    • using sbx and JAX could speed it up, but I couldn't cleanly set up JAX CUDA with poetry
    • found that a higher learning rate converges faster; the other hyperparameters didn't seem to have an effect. todo (later): use rliable to properly evaluate sweep results!
  • add raw angles to the obs when limits depend on them, so as not to break the Markov assumption (see the wrapper sketch below)
  • add a PID to slow down the motor on reset, trying to gain some wall time during training (see the PID sketch below). It is currently badly tuned and may have damaged the motor; this needs further investigation
  • delete unused code: scripts/train_sac.py and scripts/robot_inference.py
  • add a DeadZone wrapper (see the sketch below); ideally we'll get rid of it once we figure out why the agent doesn't explore properly without it
  • debug an issue where it seemed gSDE wasn't being used with TQC on sbx (see the gSDE note below)
    • we were updating the policy at each step, which made the exploration noise look like Gaussian noise
  • set up training on Mac
  • fix a bug in the FurutaReal env reset: it only worked between -pi and pi and didn't account for the pendulum doing multiple turns, which meant we sometimes waited until the reset timeout when we didn't need to (see the angle-wrapping sketch below)
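Reward shapes: a minimal sketch of the two reward families compared above; the exact formulas, angle convention, and scaling used in the repo may differ, and the function names are illustrative.

```python
import numpy as np

# theta is the pendulum angle, with theta = 0 meaning upright (assumed convention).
def cos_reward(theta: float) -> float:
    # 1 when upright, -1 when hanging down; changes fastest around the horizontal.
    return float(np.cos(theta))

def exp_reward(theta: float, scale: float = 2.0) -> float:
    # Peaks sharply near upright (rewards grow quickly above pi/2) but is almost
    # flat when the pendulum hangs down, which can make early exploration slower.
    return float(np.exp(-scale * np.abs(theta)))
```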
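Sweep sketch: a minimal example of what a learning-rate sweep over TQC could look like, using sb3_contrib's TQC and a stand-in gym env; the env id, timesteps, and swept values here are placeholders, not the repo's actual sweep.

```python
import gymnasium as gym
from sb3_contrib import TQC  # sbx exposes a TQC with a similar interface

for lr in (3e-4, 7e-4, 1e-3):  # illustrative values, not the grid used in the sweep
    env = gym.make("Pendulum-v1")  # stand-in for the Furuta env
    model = TQC("MlpPolicy", env, learning_rate=lr, verbose=0)
    model.learn(total_timesteps=50_000)
    model.save(f"tqc_lr_{lr:.0e}")
```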
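Wrapper sketch: a hedged illustration of the raw-angle idea. If the observation only contains wrapped angles (or cos/sin) but the safety limits depend on the unwrapped angles, appending the raw angles keeps the observation Markovian. The attribute names on the underlying env are hypothetical.

```python
import numpy as np
import gymnasium as gym

class RawAngleObservation(gym.ObservationWrapper):
    """Append the unwrapped motor/pendulum angles to the observation."""

    def __init__(self, env):
        super().__init__(env)
        low = np.concatenate([env.observation_space.low, [-np.inf, -np.inf]]).astype(np.float32)
        high = np.concatenate([env.observation_space.high, [np.inf, np.inf]]).astype(np.float32)
        self.observation_space = gym.spaces.Box(low=low, high=high, dtype=np.float32)

    def observation(self, obs):
        # motor_angle / pendulum_angle are hypothetical attributes holding the
        # unwrapped angles tracked by the underlying env.
        raw = np.array(
            [self.unwrapped.motor_angle, self.unwrapped.pendulum_angle],
            dtype=np.float32,
        )
        return np.concatenate([obs, raw])
```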
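PID sketch: a rough illustration of the reset idea, a small PID loop driving motor velocity to zero before the next episode starts so we don't wait for the motor to coast down. Gains, timestep, and the surrounding motor interface are placeholders, not the repo's actual API.

```python
class VelocityPID:
    """Drive the motor velocity to zero during reset (illustrative gains)."""

    def __init__(self, kp: float = 0.5, ki: float = 0.0, kd: float = 0.05, dt: float = 0.02):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, velocity: float) -> float:
        error = -velocity  # setpoint is zero velocity
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        # The returned value would be sent as the motor command by the reset loop.
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```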
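DeadZone sketch: a hedged guess at what the wrapper does, remapping small action magnitudes past the band where the motor doesn't move so that tiny exploratory actions still produce motion. The threshold and the exact remapping in the repo may differ.

```python
import numpy as np
import gymnasium as gym

class DeadZone(gym.ActionWrapper):
    """Rescale actions so their magnitude skips the motor's dead zone."""

    def __init__(self, env, deadzone: float = 0.2):
        super().__init__(env)
        self.deadzone = deadzone

    def action(self, act):
        act = np.asarray(act, dtype=np.float32)
        # Map |act| in (0, 1] onto (deadzone, 1]; zero stays zero because sign(0) == 0.
        return np.sign(act) * (self.deadzone + (1.0 - self.deadzone) * np.abs(act))
```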
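gSDE note: with `use_sde=True` the exploration noise is a function of policy features, so if the policy weights change every step the noise decorrelates and is hard to tell apart from plain Gaussian noise. A hedged illustration of the relevant knobs in sb3_contrib's TQC follows; this is not necessarily the exact fix applied here, and the env id is a stand-in.

```python
from sb3_contrib import TQC  # sbx exposes a TQC with a similar interface

model = TQC(
    "MlpPolicy",
    "Pendulum-v1",        # stand-in env id
    use_sde=True,         # state-dependent exploration instead of i.i.d. Gaussian noise
    sde_sample_freq=-1,   # resample the noise matrix once per rollout
)
```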
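Angle-wrapping sketch: the reset fix boils down to folding an angle that may have accumulated several turns back into [-pi, pi) before checking whether the pendulum has settled, so a few extra rotations don't make reset wait for the timeout. Names, the angle convention, and the settling check below are illustrative.

```python
import math

def wrap_angle(angle: float) -> float:
    """Map an angle that may span multiple turns into [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi

def pendulum_settled(raw_angle: float, tol: float = 0.1) -> bool:
    # raw_angle measured from upright; the pendulum rests near +/- pi (hanging down).
    return abs(wrap_angle(raw_angle)) > math.pi - tol
```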

@Armandpl Armandpl linked an issue Feb 1, 2024 that may be closed by this pull request
@Armandpl Armandpl changed the title from improve gym env / tune hp to improve gym env / tune hps on Feb 1, 2024
@Armandpl Armandpl merged commit f12a1da into master Feb 1, 2024
2 checks passed
@Armandpl Armandpl deleted the sanity-check-robot branch February 1, 2024 16:20
Development

Successfully merging this pull request may close these issues.

sanity check by training with sac or tqc